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Abstract. We study the performance and convergence properties of the Susceptibility Propagation (SusP) 
algorithm for solving the Inverse Ising problem. We first study how the temperature parameter (T) in 
a Sherrington-Kirkpatrick model generating the data influences the performance and convergence of the 
algorithm. We find that at the high temperature regime (T > 4), the algorithm performs well and its quality 
is only limited by the quality of the supplied data. In the low temperature regime (T < 4), we find that the 
algorithm typically does not converge, yielding diverging values for the couplings. However, we show that 
by stopping the algorithm at the right time before divergence becomes serious, good reconstruction can 
be achieved down to T ~ 2. We then show that dense connectivity, loopiness of the connectivity, and high 
absolute magnetization all have deteriorating effects on the performance of the algorithm. When absolute 
magnetization is high, we show that other methods can be work better than SusP. Finally, we show that 
for neural data with high absolute magnetization, SusP performs less well than TAP inversion. 

PACS. 7 5.10.Nr,02.50.Tt,05.10.-a 



1 Introduction 

Problems in Statistical Mechanics and those in Statisti- 
cal Inference can be thought of as being the inverses of 
each other. In statistical mechanics one is usually given a 
Gibbs distribution and is asked to compute moments of 
some observables. In statistical inference, one is given a 
set of observables and is asked to reconstruct the distri- 
bution that generated them. Although the two fields have 
traditionally been developed separately, recently the con- 
nections and similarities have been highlighted, see e.g. 

A paradigmatic scenario in this direction is the Inverse 
Ising problem, that is finding the couplings of an Ising 
model given data e.g. mean magnetizations and pairwise 
correlations. Two strands of questions in biology have re- 
cently motivated this problem. The first one comes from 
Neuroscience. While traditionally it has only been pos- 
sible to record simultaneously from a few neurons at a 
time, for special cases, e.g. retinal cells recordings from 
hundreds of neurons are now possible, and techniques al- 
lowing for many more are on the horizon. The problem 
is then to infer functional connectivities between neurons 
from these recorded multi-neuronal spike trains. P^[^[T51 
\TT\ . Second, global gene expression measurements by e.g. 
microarray technologies have been around for more than a 
decade, and a standard way to analyse them is through co- 



expression, i. e. correlation of expression of genes or groups 
of genes across different conditions. Correlation is not cau- 
sation. If, in fact, the gene expression measurements are 
snap-shots of a probability distribution generated by an 
Ising model, then the most significantly coupled genes will 
be the ones most strongly coupled in the Ising model, not 
the most strongly correlated, and this is then another way 
to classify genes as similarly or not similarly expressed [1] . 

The inverse Ising problem is a difficult combinatorial 
optimization problem in the class known as "NP-hard" . In 
theory, only approximate schemes, or methods that take 
more than polynomial time to find the answer are possible. 
Boltzmann Learning [1] is an iterative method where in 
one step the correlation functions are computed given an 
Ising model, and in another step the Ising model couplings 
are modified to adjust to data. In principle, Boltzmann 
learning can be employed to find the couplings with arbi- 
trary accuracy given accurate data and sufficient time, but 
the slow convergence of the Boltzmann learning makes it a 
very inefficient algorithm for most practical purposes. The 
main approximate schemes using means and correlations 
are inversion of the correlation matrix ( "naive mean- field 
theory") as used in [i], Thouless- Anderson-Palmer (TAP) 
formula [31IT8] , Independent Pairs Approximation [T3| and 
the perturbative scheme of an auxiliary statistical mechan- 
ics model of Sessak and Monasson [TS] . If the sample data 
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can be used as such (and not only via means and pair- 
wise correlations), a linear regression can be performed 
on a link- by-link basis, which is quite powerful when the 
underlying matrix of couplings in sparse jlOj . 

In this paper, we study the recently introduced Suscep- 
tibility Propagation (SusP), an approximate scheme also 
based on means and correlations [7]. SusP is a message- 
passing algorithm which is derived by using the Fluctuation- 
Response theorem to the update rules of Belief Propaga- 
tion. Although there are particular features specifically 
for Ising spins, on one level, and more generally, SusP 
can be considered as Boltzmann Learning scheme, where 
the correlation functions are computed by Belief Propaga- 
tion instead of (much more slowly) by Monte Carlo. Belief 
Propagation is based on messages exchanged between each 
pair of nodes [5D] . SusP uses such messages as well as the 
derivatives of these messages with respect to the field at a 
third node, yielding in the end messages involving triplets 
of nodes. SusP is fairly sensitive to the accuracy by which 
the correlation functions are known [5j|9| . 

Our contributions in this paper are as follows. In the 
paradigmatic Sherrington-Kirkpatrick model, also studied 
in [7] and [5], we provide numerical evidence for a phase 
transition between reconstructible and non-reconstructible 
phase, relative to the SusP algorithm. Perhaps surpris- 
ingly, this transition is not at the spin glass transition 
{Tc — 1) but some distance into the disordered param- 
agnetic phase {TsusP ~ 4). Wc show that in the non- 
reconstructible but still disordered phase, SusP almost 
converges, in the sense that trajectories of the updates 
according to SusP come close to a marginally unstable 
fixed point, and spend a long time in the neighborhood of 
that fixed point before eventually diverging. We introduce 
a heuristic stopping criterium for SusP in this region, and 
with a reasonable criterion for quantitative reconstruction 
we are able to push the threshold for (approximate) recon- 
struction down to TsusP' ~ 2). We also investigate how 
the performance of SusP on a Sherrington-Kirkpatrick 
model depends on external field (magnetization). Going 
beyond the Shcrrington-Krirkpatrick model, we consid- 
ered various sparse models, where many (or most) of the 
potential couplings are zero. In particular, we show that 
SusP works very well on a randomly diluted Sherrington- 
Kirkpatrick model, and in this scenario clearly outper- 
forms other approximate schemes. Furthermore, we show 
that, given a fixed sparsity, the reconstruction is better on 
a randomly connected graph compared to a regular lattice. 
Finally, we show that for in silico neural data for which 
TAP inversion and Sessak-Monasson approximations work 
well [OlfTI] . SusP appears to be out-performed by these 
two simpler schemes. 



2 General Setting 

Susceptibility propagation is a fairly complicated algo- 
rithm, which, for completeness, we describe in Appendix. 
To be able to refer to a definite formula, let us however 
state that one central step in SusP is the inversion 



Jij = tanh 



Cij — tanh hi^j tanh hj^ 
1 — Cij tanh hi^j tanh /ij_ 



(1) 



where hi-^j and hj-^i are Belief Propagation messages and 
Cij are auxilary quantities encoding gradients of messages, 
the observed correlation between spins i and j, as well as 
the observed magnetizations of spin i and spin j. After an 
update of the Jij the messages and their gradients are up- 
dated according to equations given in Appendix, where the 
Jij enter parametrically, and this procedure is repeated 
until convergence. When this procedure converges, and 
how accurate it is when it does are important points to 
understand here. In this Section we describe how we adress 
these two points in this paper. 

The overall idea is to compute mean magnetizations 
and pairwise correlations from a known Ising model and 
use them to reconstruct the model back. We can then il- 
lustrate reconstruction by a scatter-plot of the inferred 
couplings Jf,l^^^ versus the (known) true couplings cou- 
plings Jij . A straight line scatter plot of slope one indi- 
cates a successful reconstruction. This can be quantified 
by the correlation coefficient of the scatter plot (i?), and 
the P-value of a the null hypothesis that the reconstructed 
couplings are equal to the true ones. Alternatively, we can 
quantify the quality of reconstruction in one of our test 
cases by the Zi-measure of [71[S]: 



A^ 



1 



std(j,,)\nv(iv-i) 



T)T.yfr 



/■■12 



(2) 



Kj 



The first test case is the Sherrington-Kirkpatrick model 
[16j of N spins, cr ~ (cti, . . . ,<7n) and Boltzmann distri- 
bution 



Pr(cr) = — exp 



i i<j 



(3) 



in which /3 — 1/T is the inverse temperature and the cou- 
plings Jij are drawn from a zero mean Gaussian distribu- 
tion with variance 1/A^. This case has already been con- 
sidered by [7] , where it was shown that SusP outperforms 
several other reconstruction methods in the high temper- 
ature and zero external field regime (small /3, all hi zero), 
and also by [5] where it was shown that SusP is sensitive 
to noise in the correlation functions. Our contribution, 
for this family of test cases, is a precise determination of 
the threshold in the low temperature regime where SusP 
ceases to converge (in Section |31), and an extension of the 
analysis to the case with non-zero external fields (in Sec- 
tion U) . We further introduce a modification of the SusP 
stopping criterion pushing the boundary of (approximate) 
reconstruction to lower temperatures. 

Motivated by the "folklore theorem" that message pass- 
ing techniques work best on locally tree-like graphs, we 
also study the effect of network geometry on the conver- 
gence of SusP. The underlying graph of the SK model is 
fully connected with many loops, so this is presumably a 
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rather difficult case for SusP. We therefore consider ran- 
domly diluted SK models, where a fraction (1 — c) of the 
couplings to each node are set to while cN randomly 
picked couplings are drawn from a Gaussian distribution 
with variance l/{cN) or \/N . In the same spirit, but in 
the opposite direction, we also consider SusP on a lat- 
tice graph, where each spin is connected to its cN nearest 
neighbors. 

Finally, we apply SusP to infer functional couplings 
in data generated from a simulated neural networks. Us- 
ing inverse models to describe the statistics of neural data 
has been one of the very active fields of research lately [HI 
[T7lfT2l [2| and various approximations from statistical me- 
chanics have been studied for the inference of Ising model 
applied to neural data [13], showing that TAP inversion 
and Sessak-Monasson approximations work very well for 
this type of data. Here we show that for fine time bines 
( 10ms), SusP is outperformed by these simpler approxi- 
mations. 

To end this Section: for small smaples, i.e. N < 20, we 
can calculate the means, mi = N~^ J^cr Pr(cr)CTi, and cor- 
relations Cij = N^^ J2(T Pr(''')'''i'''j —rnirrij by exact enu- 
meration and summing over all the 2^ states. For larger 
values of N, we use Monte Carlo sampling to estimate the 
means and correlations since the exact summation is not 
feasible. For larger values of N the means and correlations 
are therefore always more or less noisy. 



3 Convergence at Low T 

In this Section we study convergence of the SusP algo- 
rithm on the test case of an SK model at zero external 
field. Temperature, as in Eq. ^, should be understood 
as a shorthand of the overall size of the couplings. It is 
well-known that this model has a phase transition to a 
spin glass phase at Tc = 1. When the temperature is high, 
the SusP algorithm can find the couplings very accurately, 
with a reconstruction error that is essentially limited by 
the quality of the measured means and correlation and the 
machine accuracy. Wc do not show these data, already ob- 
tained by [S] , but only point out that in the high temper- 
ature regime other reconstruction methods also work, and 
importantly the high temperature TAP inversion tech- 
nique would work well [5111^. Therefore, SusP can only 
be a competitive reconstruction method at high temper- 
ature if both the quality criteria are very strict, and also 
the means and correlations are known very accurately. 

The interesting case is, therefore, at low temperatures, 
where most reconstruction techniques have difficulties, and 
where the SK model approaches the spin-glass phase. One 
way in which the procedure sketched above and embodied 
by Eq. ([T]) can definitely fail to converge is if the argument 
of the right hand side of Eq. (|T|) eventually becomes larger 
than one in absolute value. If so, inversion is not possible 
(imaginary Jij). A convenient empirical tool to monitor al- 
gorithm convergence and performance on families of large 
instances is a pivoting plot. A number of random instances 
are generated and the convergence or performance of the 






o 
U 



- 


li 


/r 
I.' 






1 ■'' 

." 

1 '' 








/•'/ 


N=10 




- 


/ ■' ' ; 
/.'If 


N=15 -■-■ 

N=30 

N=60 











Temperature T 

Fig. 1. Convergence of SusP with no stopping criterion or 
damping factor. A phase transition seems to occur around 
TsusP ~ 4. We can conjecture that at higher A'' the phase 
transition will be sharper. 



algorithm is determined across the family. The fraction of 
good outcomes then varies with the size of the instances 
and a parameter describing the family. In the case at hand, 
this parameter is the temperature T, and the size is N. In 
favorable cases the location of the transition between high 
and low fractions of good outcomes depend only weakly 
on A'^ and becomes sharper a N grows, giving rise to a 
characteristic "pivoting" shape of the cumulative empiri- 
cal probability distributions. 

As shown in Fig. [U for SusP on the SK model, there 
is clear pivoting and a threshold separating a convergent 
from a non-convergent phase. This transition occurs at 
TsusP ~ 4. To avoid this convergence problem, the update 
rule can be modified with a damping factor [5] . There are 
two problems, however, with the low-T convergence even 
when the damping factor is used. First, the damping fac- 
tor makes the convergence slow for low T because strong 
damping e must be employed. Second, we have also ob- 
served that for sufficiently low T, although the imaginary 
J problem can be avoided by using the damping factor, 
the algorithm fails to converge even with very small e. 

The behavior of the algorithm on representative in- 
stances are exemplified in Fig. ^ where we show total re- 
construction error as well as the dynamics of one example 
coupling versus the number of iterations used in the algo- 
rithm. It is clearly seen that even for small T, where the 
algorithm will eventually diverge, there is a plateau where 
the couplings remain almost constant for many iterations. 
In fact, comparing with the dynamics of J at larger T, 
where convergence does occur, we see that the dynamics 
is then qualitatively similar, only the plateau then be- 
ing also the asymptotically steady state. In other words, 
the dynamics of the SusP algorithm close to the solution 
has a direction which changes from marginally stable to 
marginally unstable as T decreases, and this can be seen 
to be the cause of bad reconstruction in this regime. Con- 
sequently, by stopping the algorithm when changes in J 
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Fig. 2. The reconstruction error as the algorithm proceeds 
(a), and the dynamics of one particular coupling Joi (b) as 
the algorithm proceeds, and for different temperatures. The 
stars show where the stopping criterion Eq. (|4]) is satisfied and 
the associated scatter plots show the inferred couplings versus 
those of the original model. 



from one iteration to the next are small, long before the 
argument of the right hand side of Eq. ([T]) turns negative, 
we can hope to achieve good reconstruction also at very 
low temperatures. 

This is indeed possible, as shown in the scatter plots in 
Fig. [5] where we have shown the inferred couplings versus 
true couplings after t iterations where 



\J'.-j'-^\>\j'-^-j'-''\ 



for at least 90% of the pairs of i and j. Fig. [3l shows 
how according to the criterion of at most 5% reconstruc- 
tion error (measured by A) reconstruction is possible at 
lower temperatures, down to new threshold TsusP' ap- 
proximately equal to 2. 



4 Influence of varying the external field and 
mean magnetization 

The results shown in the previous section as well as those 
reported in [S] were all run on data from a model with- 
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out external field, i.e. when hi = for all i. In this case, 
because of the symmetry of the problem the mean magne- 
tizations are all zero (if calculated exactly), and fluctuate 
around zero when computed by Monte Carlo sampling. 
This raises the question of the effect of magnetization and 
external fields on the reconstruction error. 

As described in Appendix, SusP first computes the 
couplings in a loop of alternate steps of changing J^ and 
Belief Propagation updates; the external fields hi are only 
computed in a Belief Propagation output step. Similarly to 
other schemes such as TAP, reconstruction of the external 
fields can therefore essentially never be better than the re- 
construction of the couplings. The questions are therefore, 
first, how well the couplings are reconstructed if the mag- 
netizations are sensibly different from zero, and second, if 
reconstruction quality is degraded in the final step from 
the couplings to the external fields. This is an important 
issue to investigate since in many real life applications, 
e.g. neural data binned at smal time bins, one would be 
dealing with highly magnetized variables. 

On the first point, in Fig. 21 we show that reconstruc- 
tion of the couplings by SusP is degraded, either as a func- 
tion of external fields, or as function of mean magnetiza- 
tion. Here we have used a uniform external field applied 
to all spins. As a function of external fields the loss of 
reconstruction quality is smooth, while as a function of 
magnetization it is abrupt, and concentrated in a narrow 
range close to |rTi| w 1, essentially due to the fact that as 
|r7T,| — > 1 one has h = tanh~^(TO) [H]. The dependence 
is sharper at higher temperature, i.e. when SusP in zero 
magnetization works well. 

On the second point, we do not observe significant 
degradation going from the couplings to the external fields. 
As shown in Fig. [5j the quality of this reconstruction also 
degrades at lower tepmperatures, as is the case for the 
couplings. 
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Fig. 4. Reconstruction error as a function of external field (a) 
and mean magnetization (b) . At all temperatures and network 
sizes tested, increasing the absolute magnitude of the magne- 
tization by increasing the external field has a negative effect 
on the reconstruction. 
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5 Geometry and sparsity of the graph 

SusP has its root in the Belief Propagation algorithm. It 
is well-known that the BP is exact on trees, i.e. graphs 
without loops [20]. We would then expect that the per- 
formance of SusP is also influenced by the presence or 
absence of loops in the model that generated the data. 

To test this hypothesis, we first studied how the spar- 
sity of the connectivity pattern of the model influences the 
reconstruction. We thus generated data from a SK model 
in which a fraction c of the couplings are put to zero while 
the rest are drawn either from a Gaussian distribution 
with variance 1/N, or with l/{cN). The results are shown 
in Fig. [S] where we plot the reconstruction error versus 
the sparsity c. We can observe a decrease in the recon- 
struction error as c decreases, thus concluding that SusP 
works better when the connectivity is sparse. For almost 
all values of c, the positive effect of a sparser graph on the 
reconstruction error is suppressed when one has stronger 
connections, i.e. when the variance of the couplings are 
taken to be l/{cN). 

To understand how the graph geometry influences SusP, 
we altered the geometry of connections, at fixed c, from 
a random connectivity to a 2D rectangular lattice with 
each node being connected to cN other closest nodes to 
it. This way, at a fixed sparsity, we essentially increase the 
loopiness of the graph, by inducing local small loops. As 
shown in Fig. [H] the presence of these local loops increases 
the reconstruction error. For a given connectivity, a ran- 
dom sparse graph has fewer local loops than the lattice 
one, which explains why the reconstruction is better on 
random sparse than on a lattice. 



6 Tests on Neural Data 

The Ising model has been used in a number of studies for 
modeling the statistics of muti-ncuron spiking patterns of 
binned spike trains. They were shown to provide a good 
model for spike patterns over N neurons, if the average 
number of spikes generated by all N neurons in a time bin 
is small |14[|12| . Various approximations have been studied 
in [T51[TTj where TAP inversion and SM approximations 
were shown to provide good estimates of the functional 
couplings. In what follows we study the performance of 
SusP on synthetic neural data generated form a simulated 
cortical network model and compare them with TAP. For 
details of the simulations see [13] . 

Fig.[7]shows the results of TAP, and SusP as compared 
to the results of Boltzmann learning for neural populations 
of size iV = 40 and A'' = 100. This figure shows that for 
this type of data TAP outperforms SusP, particularly for 
large N. One reason for this could be the fact that to 
make a binary representation of the spike trains, one has 
to bin them to small time bins, typically of size St = 10ms. 
This yields a mean probability of spike per bin of ~ 0.9 
or mean magnetization of ~ —0.7. Our previous analysis 
in section |4] did provide the conclusion that large absolute 
magnetization has a negative effect on the performance of 
SusP. 
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Fig. 6. The influence of the sparsity and connectivity pattern 
on SusP. (left) The reconstruction error is shown versus the 
fraction of connections for different sizes both in the case of 
a random graph and a 2D lattice. In this case the variance of 
the couplings scale as 1/7V independent of c. (right) This is 
the same as a, except that the variance of the couplings now 
scale as 1/cN. SusP appears to work better on sparser and less 
loopy graph. 



7 Discussion 

The ability of experimentalists to observe the activity of 
a large number of elements in biological networks needs 
to be accompanied by mathematical tools to analyse the 
high dimensional recorded data. One approach for doing 
this it to develop analytical and numerical tools to in- 
fer a probability distribution with a small number of pa- 
rameters from the recorded data, small compared to the 
possible number of states of the system. The fitted model 
can then be used to generate synthetic data, or it could be 
used to learn something about the network that generated 
the data, i.e. to learn functional connections between the 
elements. 

The Ising model is probably too simple a model for 
many real life purposes and the inferred couplings may 
not correspond to real physical interactions, depending 
on the real underlying physical system |13| . However, it 
provides an excellent platform to study the analytical and 
theoretical aspects of the problem. Many of the hurdles 
encountered in the inverse Ising problem are likely to be 
present when dealing with more complicated models. De- 
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Fig. 7. Comparing SusP (left) and TAP results (right) versus 
the Boltzmann learning for A'' = 40 (up) and N = 100 (down) 
for synthetic neural data binned at 5t = 10 ms. 



veloping and analyzing approximate and iterative meth- 
ods for inverse Ising problems has therefore attracted a 
lot of attention in the past few years. 

The SusP algorithm is the result of the last effort in 
this direction. Not surprisingly, it exhibits complex dy- 
namical behaviors when applied to loopy graphs some 
of which we studied here. Importantly, the simplest im- 
plementation of the algorithm exhibits a dynamic phase 
transition to a non-convergent regime at TsusP ~ 4, i.e. 
higher than the equilibrium phase transition of the SK 
model. Our numerical experiments show that for a range 
of temperatures below TsusP the lack of convergence of 
the algorithm is due to the presence of an unstable di- 
rection in the trajectory of the couplings, such that the 
algorithm gets close to the true couplings, stays there for 
a while and then moves away from it. Exploiting this fact, 
it is possible to perform good reconstructions down to 
TsusP' ~ 2. Whether it is possible to avoid such a di- 
rection all together requires a more detailed study of the 
dynamics of the couplings during the learning process. It 
would also be interesting to see whether loop correction 
methods such as those developed for Belief Propagation 
[5] could be also employed in SusP. 

Two new observations of the SusP algorithm have been 
made in this paper. One positive, albeit expected, is that 
SusP works much better (at large typical couplings) on 
sparse graphs. If it is known a priori that the underly- 
ing graph is sparse, then SusP should be very competitive 
to other schemes using means and correlations such as 
TAP or Sessak-Monasson. However, this is also the regime 
where the recent li reconstruction of Ravikumar, Wain- 
wright and LafFerty |10| provably works. More experimen- 
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tation is therefore needed to estabhsh if SusP is compet- 
itive in this regime. Our second observation, negative, is 
that reconstruction by SusP degrades in a strong exter- 
nal field. Qualitatively speaking, a strong external bias 
overwhelms the correlations since spins do not fluctuate 
much, and a finite amount of data on correlated fluctu- 
ations contains less useful actionable information for the 
SusP reconstruction. Since the high-field limit is relevant 
to neural data, this may limit the applicability of SusP 
in this domain. In tests on one series of synthetic neural 
data, we indeed found that SusP does not work better 
than simpler schemes such as TAP. 

SusP has already been extended and used for non- 
binary systems by Weigt et al [TH] . These authors extended 
the SusP algorithm to Potts-like variable and successfully 
inferred direct physical interactions between amino acids 
in a family of two-component signal transduction path- 
ways in bacteria. SusP has therefore proven its usefulness 
on a problem of significant biological interest. The conver- 
gence properties of this extended SusP in a suitable class 
of random models have however not been systematically 
investigated. 



A The Susceptibility Propagation Algorithm 

This appendix contains the update rules for the Suscep- 
tibility Propagation algorithm and their demonstration. 

and are derived 



The rules are stated in Subsection lA.li 

in Subsection lA.21 The relation to Boltzmann machines is 

described in Subsection lA^ 



The couplings Jij are derived in the update step. The 
external fields are computed from the converged messages 
and the magnetizations as 



hi ^ arctanh mi 






(6) 



A. 2 Deriving the SusP update rules 

As a starting point, we will use the canonical update equa- 
tions of Belief Propagation applied to the pairwise Ising 
model: 

In these two equation, Pi^j and Qj-^i are the messages 
exchanged over spins i and j, and di are the spins in the 
neighbourhood of i. Proportionality means up to a nor- 
malization e.g. Yl^^.Pi^jicTi) = 1 and Y^^, qj^i{ai) = 1. 
Finally, to retrieve the marginals we have the following 
equation: 

where proportionality again means up to a normalization. 
For Ising spins it is convenient to work with the log- 
likelihood notation 



A.l Susceptibility Propagation Equations 



h 



t^j 



1 ?W±i) 

2 P.^,(-l) 



Initialization: The messages hi^j and Vi^j^k are chosen 
at random while the other messages Uj^i and gi^j^k are 
set to 0. The couplings J^ are initially also set to 0. 



which satisfy 



'l-S.J 



1 ^,^,(+1) 

2 g.^,(-l) 



Update rules: 



'^i— >j — "■« 



J2 "*^- 

kGdi\j 



hi^j <— arctanh rrii 



k 






tanh J„- ■<— £■ 



Cij — tanh hi^j tanh hj^i 
1 — Cij tanhhi^j tanhhj^i 



tanh u. 



7->J 



tanh Jij tanh hi^j 



1 - tanh h^^j 

Vi^j,k ^ gi^j.k tanh Jij -2 

1 — tanh Ui 



H->J 



Output rules: 



(5a) 
(5b) 

(5c) 
(1 — e) tanh Jy 



and (in the other direction) 



tanh Ui^j = tanh Jji tanh hi^j 



This is Eq. ((5e)) of the SusP update rules. 
SusP works also with the gradient of messages 



54->i,fc 



dh 



1^1 



dhk ' 



Vt^j^k 



du^ 



-^3 



dhk 



Their update rules are derived by differentiating the up- 
(5d) date rules of the messages with respect to external fields, 
(5e) and give 

9i^j,k ~ 2^ Vl^i,k + 6i^k 

(5f) iedi\] 

and 



Vi^rj^k — 9i^rj,k 



tanh 7 ^ " ^""''^ ^'^' 



1 — tanh u. 



i^j 
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These arc Eqs. (|5b|) and ([5f|) of the SusP update rules. 

The log-hkeUhood quantities describing the marginals 
are the effective fields Hi ~ ^ log ^ (_iy Exact effective 
fields are related to magnetizations rrii and correlation 
functions Xij by 

nii = tanh Hi 

and (by fluctuation-dissipation) 

d tanh Hi 



Xij 



dh^ 



On the other hand, in Belief Propagation the effective 
fields are related to the log-likelihood messages as 



ff, 



jedi 



Combining the two, or, what is the same, taking the Belief 
Propagation computed effective fields as proxies for the 
exact effective fields, we have 



hi = tanh (mi) — y 



jedi 

which is Eq. ([SJ, the SusP output rule. Substituting hi 
from above into the update equation for /i,;_j.j we have 

hi^j = tanh"^(TOi) - Uj^i 

which is Eq. ([5a|) of the SusP update rules. 

If the correlation functions Xij ^-re expressed in terms 
of the partial derivatives of the effective fields, and the 
effective fields are computed by Belief Propagation, then 
we will show that 



using the shorthand 



a ~ tanh Jij b = tanh h^. 



tanh h-j 



such that 



1 + abc 



c + at 

^i = tanhu.,_ 

J 1 I „u„ J 



1 + abc 



Eq. ^ is an algebraic identity, as can verified by termwise 
comparison, which therefore shows Eq. ([7]) and Eq. ([5]). 

In the final step we express Jij as a function of Xij and 
all the other quantities. It is convenient to first introduce 
yet another auxiliary quantity 



9j^i,J 



(10) 



which is Eq. ([5c|) of the SusP update rules. In Eq. ([T0| 
the correlations x are from data. If, on the other hand, we 
substitute Eq. ^ and Eq. (O into Eq. dTU]) we have 



L-ii — 



tanh Jji + tanh hi^j tanh hj^ 



1 + tanh Jji tanh hi^j tanh hj^i 

which can be inverted to 

Cij — tanh hi^j tanh hj^i 



tanh J-i 



1 — Cij tanh hi^j tanh hj^i 



(11) 



(12) 



which is Eq. (|5dl) . the final update rule for SusP, and also 
the central step (Eq. ((!])) quoted in the bulk of the paper. 



A. 3 Relation of SusP to Boltzmann machines 



where the auxiliary quantities Xij are 

tanh Jji + tanh hi^j tanh hj^i 



Xij 



1 + tanh Jji tanh hi^j tanh h 



(7) 



(8) 



j^i 



These equations do not (in general) hold exactly, but only 
under the assumption that Belief Propagation is exact. If 
and when so, then by fiuctuation-dissipation 



Xij 



d tanh Hi drrii 



dh, 



dhj 



jedi "^J^i) 



dh 



i^-ml) 



d{h 



»->3 



J^'y 



dh, ^-^^ 

= (wj^jj +5»^jj)(l-m^) 

To show Eq. ([7]) and Eq. ([5]) we hence have to prove that 
Vj^i_j(l — ?Ti|) = Xij9j^i,j 1 which is the same as 



1 — a^c^ 



(l-( 



b + ac 2-. 
l + abc' ' 



a + bc c + ab b + ac 

V 

1 -|- abc 1 + abc 1 + abc 

(9) 



We recall that the Boltzmann machine approach consists 
of the iterative procedure 



SJtj = V {x 



Data 
ij 



X. 



ModclN 



(13) 



In practical applications this is often prohibitively ex- 
pensive, because the correlations xfj°'^'^^ have to be evalu- 
ated by Monte Carlo. Eq. ^ and Eq. ([5]) are a means to 

quickly (if approximately) evaluate the correlations Xij 
using Belief Propagation variables and gradients of Belief 
Propagation variables. 

The fixed point of a Boltzmann machine approach must 
satisfy the set of equations 



x: 



Data 



Model 
Xij 



iUv}) 



(14) 



Model 



where we have made explicit that the correlations x, 
depend on all the model parameters {Jij}- Eq. ([7]) is of 
just this form, and so is Eq. (jlip for the auxiliary quan- 
tities Cij . A standard way to solve such equations would 
be by the Newton method or other general-purpose root- 
finding methods. Eq. (|12p however means to solve each 
equation for Xij separately by varying the corresponding 
Jij, and keeping all else fixed. 

Iterating Belief Propagation until convergence and then 
inverting locally for Jij potentially wastes computation 



Erik Aurell et al.: Dynamics and Performance of Susceptibility Propagation on Synthetic Data 



time in iterations. The actual scheme of [7| described above 
is therefore to interleave Belief Propagation update steps 
(including updating the BP gradient quantities), and lo- 
cally solving for J^ . This explains the order of the oper- 
ations stated in lA.ll Convergence of this scheme is how- 
ever not an easy question to answer. Presumably there 
can be complications analogous to using Newton method 
in a high-dimensional situation (one has to have a good 
starting point), but as discussed in the bulk of the paper, 
there also seem to be other effects (eigenvalue cross-over 
before the spin glass transition). 
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